AITopics | difference estimator

PPI is the Difference Estimator: Recognizing the Survey Sampling Roots of Prediction-Powered Inference

Mozer, Reagan

arXiv.org Machine Learning, Mar-20-2026

Prediction-powered inference (PPI) is a rapidly growing framework for combining machine learning predictions with a small set of gold-standard labels to conduct valid statistical inference. In this article, I argue that the core estimators underlying PPI are equivalent to well-established estimators from the survey sampling literature dating back to the 1970s. Specifically, the PPI estimator for a population mean is algebraically equivalent to the difference estimator of Cassel et al. (1976), and PPI++ corresponds to the generalized regression (GREG) estimator of Särndal et al. (2003). Recognizing this equivalence, I consider what part of PPI is inherited from a long-standing literature in statistics, what part is genuinely new, and where inferential claims require care. After introducing the two frameworks and establishing their equivalence, I break down where PPI diverges from model-assisted estimation, including differences in the mode of inference, the role of the unlabeled data pool, and the consequences of differential prediction error for subgroup estimands such as the average treatment effect. I then identify what each framework offers the other: PPI researchers can draw on the survey sampling literature's well-developed theory of calibration, optimal allocation, and design-based diagnostics, while survey sampling researchers can benefit from PPI's extensions to non-standard estimands and its accessible software ecosystem. The article closes with a call for integration between these two communities, motivated by the growing use of large language models as measurement instruments in applied research.
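As a rough illustration of the estimator this abstract refers to, the sketch below computes a difference-style estimate of a population mean: the average prediction over a large unlabeled pool, corrected by the average residual on the labeled set. The function names and the toy numbers are hypothetical, and the standard error assumes the labeled and unlabeled samples are independent simple random samples, which is an assumption of this sketch rather than something stated in the abstract.

```python
import math
import statistics


def difference_estimate(preds_unlabeled, preds_labeled, labels):
    """Mean prediction on the pool plus mean (label - prediction) on the labeled set."""
    if len(preds_labeled) != len(labels):
        raise ValueError("each labeled unit needs one prediction and one label")
    residuals = [y - f for y, f in zip(labels, preds_labeled)]
    return statistics.fmean(preds_unlabeled) + statistics.fmean(residuals)


def difference_standard_error(preds_unlabeled, preds_labeled, labels):
    """Plug-in standard error, assuming the two samples are independent (sketch assumption)."""
    residuals = [y - f for y, f in zip(labels, preds_labeled)]
    var_pool = statistics.variance(preds_unlabeled) / len(preds_unlabeled)
    var_resid = statistics.variance(residuals) / len(residuals)
    return math.sqrt(var_pool + var_resid)


# Toy numbers (hypothetical): the predictions run 1.0 too low on the labeled units.
pool_preds = [1.0, 2.0, 3.0, 4.0]
labeled_preds = [1.0, 3.0]
gold_labels = [2.0, 4.0]

estimate = difference_estimate(pool_preds, labeled_preds, gold_labels)  # 2.5 + 1.0 = 3.5
std_error = difference_standard_error(pool_preds, labeled_preds, gold_labels)
print(estimate, std_error)
```

The residual term is what removes prediction bias: if the predictions were perfect, the correction would be zero and the estimate would reduce to the pool average.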

large language model, machine learning, natural language, (19 more...)

2603.1916

Country:

North America > United States > New York (0.05)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.36)

Practical Improvements of A/B Testing with Off-Policy Estimation

Sakhi, Otmane, Gilotte, Alexandre, Rohde, David

arXiv.org Machine Learning, Jun-16-2025

We address the problem of A/B testing, a widely used protocol for evaluating the potential improvement achieved by a new decision system compared to a baseline. This protocol segments the population into two subgroups, each exposed to one version of the system, and estimates the improvement as the difference between the measured effects. In this work, we demonstrate that the commonly used difference-in-means estimator, while unbiased, can be improved. We introduce a family of unbiased off-policy estimators that achieves lower variance than the standard approach. Among this family, we identify the estimator with the lowest variance. The resulting estimator is simple and offers substantial variance reduction when the two tested systems exhibit similarities. Our theoretical analysis and experimental results validate the effectiveness and practicality of the proposed method.
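For orientation, here is a minimal sketch of the baseline this abstract calls the difference-in-means estimator, together with its usual plug-in standard error. It is not the off-policy estimator the authors propose, whose form the abstract does not give; the function name and the sample values are hypothetical.

```python
import math
import statistics


def difference_in_means(treatment, control):
    """Return (estimated effect, standard error) for two independent groups."""
    effect = statistics.fmean(treatment) - statistics.fmean(control)
    # Variances of the two group means add because the groups are independent.
    variance = (statistics.variance(treatment) / len(treatment)
                + statistics.variance(control) / len(control))
    return effect, math.sqrt(variance)


# Hypothetical per-user outcomes for the new system (B) and the baseline (A).
outcomes_b = [1.0, 2.0, 3.0]
outcomes_a = [0.0, 1.0, 2.0]

effect, std_error = difference_in_means(outcomes_b, outcomes_a)
print(effect, std_error)  # effect is 1.0; standard error is sqrt(2/3)
```

Any estimator with lower variance than this one shrinks the standard error for the same traffic, which is the kind of improvement the abstract describes.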

artificial intelligence, data mining, machine learning, (18 more...)

2506.10677

Country:

Europe > France > Île-de-France > Paris > Paris (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Data Science > Data Mining (0.69)

A Framework for Adversarial Streaming via Differential Privacy and Difference Estimators

Attias, Idan, Cohen, Edith, Shechner, Moshe, Stemmer, Uri

arXiv.org Artificial Intelligence, Sep-26-2022

Streaming algorithms are algorithms for processing large data streams while using only a limited amount of memory, significantly smaller than what is needed to store the entire data stream. Data streams occur in many applications including computer networking, databases, and natural language processing. The seminal work of Alon, Matias, and Szegedy [AMS99] initiated an extensive theoretical study and further applications of streaming algorithms. In this work we focus on streaming algorithms that aim to maintain, at any point in time, an approximation for the value of some (predefined) real-valued function of the input stream. Such streaming algorithms are sometimes referred to as strong trackers. For example, this predefined function might count the number of distinct elements in the stream.
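To make the distinct-elements example concrete, the sketch below uses a k-minimum-values (KMV) counter, a standard textbook technique that keeps only the k smallest hash values seen so far and can report an estimate at any point in the stream. It is swapped in purely for illustration: it is not the framework proposed in this paper, and it makes no attempt at robustness to adaptively chosen inputs. The class name and the choice of k are hypothetical.

```python
import hashlib
import heapq


class KMVDistinctCounter:
    """Approximate distinct count using O(k) memory (k-minimum-values sketch)."""

    def __init__(self, k=64):
        if k < 2:
            raise ValueError("k must be at least 2")
        self.k = k
        self._heap = []        # negated hashes, so _heap[0] is the largest kept hash
        self._members = set()  # hash values currently kept

    @staticmethod
    def _hash01(item):
        # Map the item to a pseudo-uniform value in [0, 1).
        digest = hashlib.sha256(repr(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") / 2**64

    def update(self, item):
        h = self._hash01(item)
        if h in self._members:
            return  # repeated element (or an already kept hash): nothing to do
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self._members.add(h)
        elif h < -self._heap[0]:
            # Insert h and evict the largest kept hash in one step.
            evicted = -heapq.heappushpop(self._heap, -h)
            self._members.discard(evicted)
            self._members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # fewer than k distinct hashes: exact
        return (self.k - 1) / (-self._heap[0])


counter = KMVDistinctCounter(k=64)
for word in ["a", "b", "a", "c", "b", "a"]:
    counter.update(word)
print(counter.estimate())  # 3.0, since fewer than k distinct items were seen
```

Memory stays bounded by k regardless of stream length; a larger k trades memory for a tighter estimate.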

artificial intelligence, estimator, natural language, (14 more...)

2107.14527

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > Nevada > Clark County > Las Vegas (0.04)
(6 more...)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Natural Language (0.87)

Scalable MCMC for Large Data Problems using Data Subsampling and the Difference Estimator

Quiroz, Matias, Villani, Mattias, Kohn, Robert

arXiv.org Machine Learning, Aug-1-2017

We propose a generic Markov Chain Monte Carlo (MCMC) algorithm to speed up computations for datasets with many observations. A key feature of our approach is the use of the highly efficient difference estimator from the survey sampling literature to estimate the log-likelihood accurately using only a small fraction of the data. Our algorithm improves on the $O(n)$ complexity of regular MCMC by operating over local data clusters instead of the full sample when computing the likelihood. The likelihood estimate is used in a pseudo-marginal framework to sample from a perturbed posterior which is within $O(m^{-1/2})$ of the true posterior, where $m$ is the subsample size. The method is applied to a logistic regression model to predict firm bankruptcy for a large dataset. We document a significant speedup in comparison to standard MCMC on the full dataset.
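The idea of estimating a log-likelihood sum with the difference estimator can be sketched as follows: sum a cheap proxy for every observation, then correct it with the scaled average gap between exact and proxy terms on a random subsample of size $m$. Everything specific here is an assumption made for illustration, including the function names, the Gaussian toy model, the Taylor proxy in the parameter, and sampling with replacement; the paper's own proxies and sampling scheme may differ.

```python
import random


def difference_estimate(exact_term, proxy_term, proxy_total, n, m, rng):
    """Estimate sum_i exact_term(i) over n terms from a subsample of size m.

    proxy_total must equal the sum of proxy_term(i) over all n terms.
    """
    sampled = [rng.randrange(n) for _ in range(m)]  # simple random sample with replacement
    mean_gap = sum(exact_term(i) - proxy_term(i) for i in sampled) / m
    return proxy_total + n * mean_gap


# Toy model (hypothetical): unit-variance Gaussian log-likelihood terms, constants dropped.
rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(1000)]
theta, theta_ref = 1.1, 1.0


def exact_term(i):
    return -0.5 * (data[i] - theta) ** 2


def proxy_term(i):
    # First-order Taylor expansion of exact_term in theta around theta_ref.
    return (-0.5 * (data[i] - theta_ref) ** 2
            + (data[i] - theta_ref) * (theta - theta_ref))


# Brute-force proxy sum for the sketch; a practical proxy would be summable cheaply.
proxy_total = sum(proxy_term(i) for i in range(len(data)))

estimate = difference_estimate(exact_term, proxy_term, proxy_total, len(data), 50, rng)
exact_total = sum(exact_term(i) for i in range(len(data)))
print(estimate, exact_total)
```

In this toy case the gap between exact and proxy terms is the same constant for every observation, so the estimate matches the full sum up to rounding. In general the estimator's variance depends on how much that gap varies, which is why accurate proxies matter.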

artificial intelligence, difference estimator, machine learning, (3 more...)

1507.02971

Genre: Research Report (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.53)
